I am a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. The team therefore wants to understand how casual riders and annual members use Cyclistic bikes differently.
About the company
In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system at any time. Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members. **Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders.** Although the pricing flexibility helps Cyclistic attract more customers, the director of marketing believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, she believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.
The business task is to identify differences between annual members and casual riders of Cyclistic bikes. This will answer the question: How do annual members and casual riders use Cyclistic bikes differently?
Insights derived from the analysis will inform the decision on whether marketing campaigns should aim at acquiring new members or at converting casual riders into annual members.
Please note that **Divvy’s bike trips dataset (Jan-Dec 2021)** was used for this project. To download this data set, please use [this link](https://divvy-tripdata.s3.amazonaws.com/index.html). It is also important to note that the company name ‘Cyclistic’ is fictional.
Questions to be asked to proceed with this analysis include:
Data used has to be:
* Unbiased
* Free from errors (data was checked for null values and formatted properly)
* Original
* Reliable
* Current (the latest data was used for this project)
* Comprehensive
#loading required libraries
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.0.4 v dplyr 1.0.2
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(dplyr) #data wrangling
library(ggplot2) # plotting charts
library(skimr) # to get a detailed info on data
library(readr)
library(tidyr) # for tidy data
library(lubridate) # to format dates
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(geosphere) #to calculate distance in metres between two geographical positions
data1=read_csv('202101-divvy-tripdata.csv')
##
## -- Column specification --------------------------------------------------------
## cols(
## ride_id = col_character(),
## rideable_type = col_character(),
## started_at = col_datetime(format = ""),
## ended_at = col_datetime(format = ""),
## start_station_name = col_character(),
## start_station_id = col_character(),
## end_station_name = col_character(),
## end_station_id = col_character(),
## start_lat = col_double(),
## start_lng = col_double(),
## end_lat = col_double(),
## end_lng = col_double(),
## member_casual = col_character()
## )
data13=read_csv('202102-divvy-tripdata.csv')
data14=read_csv('202103-divvy-tripdata.csv')
data4=read_csv('202104-divvy-tripdata.csv')
data5=read_csv('202105-divvy-tripdata.csv')
data6=read_csv('202106-divvy-tripdata.csv')
data7=read_csv('202107-divvy-tripdata.csv')
data8=read_csv('202108-divvy-tripdata.csv')
data9=read_csv('202109-divvy-tripdata.csv')
data10=read_csv('202110-divvy-tripdata.csv')
data11=read_csv('202111-divvy-tripdata.csv')
data12=read_csv('202112-divvy-tripdata.csv')
dim(data1)
## [1] 96834 13
dim(data13)
## [1] 49622 13
dim(data14)
## [1] 228496 13
dim(data4)
## [1] 337230 13
dim(data5)
## [1] 531633 13
dim(data6)
## [1] 729595 13
dim(data7)
## [1] 822410 13
dim(data8)
## [1] 804352 13
dim(data9)
## [1] 756147 13
dim(data10)
## [1] 631226 13
dim(data11)
## [1] 359978 13
dim(data12)
## [1] 247540 13
data=rbind(data1,data13,data14,data4,data5,data6,data7,data8,data9,data10,data11,data12)
nrow(data)
## [1] 5595063
#appending all rows since they have same columns
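The twelve `read_csv()` calls and the `rbind()` above could also be written as one loop over the files. The following is a minimal sketch, assuming the monthly files sit in one folder and follow the `YYYYMM-divvy-tripdata.csv` naming seen above; the helper name `read_monthly_trips` and the two tiny demo files are made up for illustration and are not part of the original analysis.

```r
library(readr)
library(dplyr)

# Hypothetical helper: read every monthly trip file in `dir` and stack the rows.
read_monthly_trips <- function(dir = ".") {
  files <- list.files(dir, pattern = "-divvy-tripdata\\.csv$", full.names = TRUE)
  # col_types = cols() lets readr guess the column types without printing a spec
  monthly <- lapply(sort(files), read_csv, col_types = cols())
  bind_rows(monthly)
}

# Demo with two made-up files so the sketch runs without the real data
demo_dir <- file.path(tempdir(), "divvy_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_csv(data.frame(ride_id = c("A1", "A2"), member_casual = c("member", "casual")),
          file.path(demo_dir, "202101-divvy-tripdata.csv"))
write_csv(data.frame(ride_id = "B1", member_casual = "member"),
          file.path(demo_dir, "202102-divvy-tripdata.csv"))

all_trips <- read_monthly_trips(demo_dir)
nrow(all_trips)
```

Sorting the file names keeps the months in calendar order, and `bind_rows()` matches columns by name, so it would flag a file whose columns differ instead of silently misaligning them.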
#detailed info of the data
skim_without_charts(data)
| Name | data |
| Number of rows | 5595063 |
| Number of columns | 13 |
| _______________________ | |
| Column type frequency: | |
| character | 7 |
| numeric | 4 |
| POSIXct | 2 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| ride_id | 0 | 1.00 | 16 | 16 | 0 | 5595063 | 0 |
| rideable_type | 0 | 1.00 | 11 | 13 | 0 | 3 | 0 |
| start_station_name | 690809 | 0.88 | 3 | 53 | 0 | 847 | 0 |
| start_station_id | 690806 | 0.88 | 3 | 36 | 0 | 834 | 0 |
| end_station_name | 739170 | 0.87 | 10 | 53 | 0 | 844 | 0 |
| end_station_id | 739170 | 0.87 | 3 | 36 | 0 | 832 | 0 |
| member_casual | 0 | 1.00 | 6 | 6 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| start_lat | 0 | 1 | 41.90 | 0.05 | 41.64 | 41.88 | 41.90 | 41.93 | 42.07 |
| start_lng | 0 | 1 | -87.65 | 0.03 | -87.84 | -87.66 | -87.64 | -87.63 | -87.52 |
| end_lat | 4771 | 1 | 41.90 | 0.05 | 41.39 | 41.88 | 41.90 | 41.93 | 42.17 |
| end_lng | 4771 | 1 | -87.65 | 0.03 | -88.97 | -87.66 | -87.64 | -87.63 | -87.49 |
Variable type: POSIXct
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| started_at | 0 | 1 | 2021-01-01 00:02:05 | 2021-12-31 23:59:48 | 2021-08-01 01:52:11 | 4677998 |
| ended_at | 0 | 1 | 2021-01-01 00:08:39 | 2022-01-03 17:32:18 | 2021-08-01 02:21:55 | 4671372 |
Tidy data to be readily available for analysis
#Taking out null values to prevent bias in data
data2<-drop_na(data)
nrow(data2)
## [1] 4588302
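`drop_na()` removes every row that has a missing value in any column. Going by the two `nrow()` outputs, that is 5,595,063 - 4,588,302 = 1,006,761 rows, or about 18% of the data, so it may be worth checking which columns drive the loss before dropping. A minimal base R sketch on a made-up toy table (the values are illustrative only):

```r
# Made-up toy data: two rows lack a station name, one lacks an end latitude
toy <- data.frame(
  ride_id = c("a", "b", "c", "d"),
  start_station_name = c("X", NA, "Y", NA),
  end_lat = c(41.9, 41.9, NA, 41.9)
)

# Missing values per column
missing_per_column <- colSums(is.na(toy))
missing_per_column

# Rows that drop_na() would remove (any column missing)
rows_with_any_na <- sum(!complete.cases(toy))
rows_with_any_na
```

On the real data, the `skim_without_charts()` summary above already suggests that the station name and id columns account for most of the missing values.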
#checking data for input errors and inconsistent formats
#ensuring datetime is of the same format in the datetime columns
data2$started_at <- ymd_hms(data2$started_at)
data2$ended_at <- ymd_hms(data2$ended_at)
#check for input errors in character columns using str_length and unique functions
unique(data2$rideable_type)
## [1] "classic_bike" "electric_bike" "docked_bike"
max(str_length(data2$ride_id))
## [1] 16
min(str_length(data2$ride_id))
## [1] 16
#adding new columns that will be needed for the analysis later
#calculating distance in metres
data2 <- data2 %>% mutate(trip_distance = distGeo(cbind(start_lng, start_lat), cbind(end_lng, end_lat)))
#measuring difference between trip start time and end time in secs
data2$triptime_in_secs <- as.numeric(difftime(data2$ended_at, data2$started_at, units ="secs"))
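As a sanity check on the distance column, a trip distance can be approximated by hand with the haversine formula. This is a sketch of a different, simpler technique than `distGeo()` (which works on an ellipsoid), so small differences are expected; the function name `haversine_m` and the 6,371 km sphere radius are assumptions made for this illustration. The coordinates are the rounded values of the first ride as they appear in the `glimpse()` output later in this report.

```r
# Haversine distance in metres on a sphere (an approximation, not distGeo's method)
haversine_m <- function(lng1, lat1, lng2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# First ride: start (lng, lat) then end (lng, lat), rounded as printed
d <- haversine_m(-87.69670, 41.90036, -87.67220, 41.89918)
d
```

The result should land within a few metres of the roughly 2,038 m reported for that ride in `trip_distance`. Note that both measures are straight-line distances between start and end points, not the route actually cycled.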
str(data2)
## tibble [4,588,302 x 15] (S3: tbl_df/tbl/data.frame)
## $ ride_id : chr [1:4588302] "B9F73448DFBE0D45" "457C7F4B5D3DA135" "57C750326F9FDABE" "4D518C65E338D070" ...
## $ rideable_type : chr [1:4588302] "classic_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : POSIXct[1:4588302], format: "2021-01-24 19:15:38" "2021-01-23 12:57:38" ...
## $ ended_at : POSIXct[1:4588302], format: "2021-01-24 19:22:51" "2021-01-23 13:02:10" ...
## $ start_station_name: chr [1:4588302] "California Ave & Cortez St" "California Ave & Cortez St" "California Ave & Cortez St" "California Ave & Cortez St" ...
## $ start_station_id : chr [1:4588302] "17660" "17660" "17660" "17660" ...
## $ end_station_name : chr [1:4588302] "Wood St & Augusta Blvd" "California Ave & North Ave" "Wood St & Augusta Blvd" "Wood St & Augusta Blvd" ...
## $ end_station_id : chr [1:4588302] "657" "13258" "657" "657" ...
## $ start_lat : num [1:4588302] 41.9 41.9 41.9 41.9 41.9 ...
## $ start_lng : num [1:4588302] -87.7 -87.7 -87.7 -87.7 -87.7 ...
## $ end_lat : num [1:4588302] 41.9 41.9 41.9 41.9 41.9 ...
## $ end_lng : num [1:4588302] -87.7 -87.7 -87.7 -87.7 -87.7 ...
## $ member_casual : chr [1:4588302] "member" "member" "casual" "casual" ...
## $ trip_distance : num [1:4588302] 2038 1114 2038 2041 2038 ...
## $ triptime_in_secs : num [1:4588302] 433 272 587 537 609 ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
#keeping only trips longer than 0 secs and no longer than 86400 secs (a day), so invalid and extreme durations do not skew the analysis
data3 <- data2 %>% filter(triptime_in_secs > 0, triptime_in_secs <= 86400)
dim(data3)
## [1] 4586829 15
#extract month and day from the started_at column
#convert datetime column to date first
data3$trip_date <- as.Date(data3$started_at)
head(data3)
#extract day of week
data3$trip_day <- weekdays(data3$trip_date)
#extract month of the year
data3$trip_month<-strftime(data3$trip_date, '%b')
head(data3)
#convert trip_day to a factor with Sunday first; otherwise summaries sort the days alphabetically
data3$trip_day<-factor(data3$trip_day, levels= c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
#preview the rows sorted by day (the result is not assigned, so data3 keeps its original row order)
data3[order(data3$trip_day), ]
#do the same for month, in calendar order
data3$trip_month<-factor(data3$trip_month, levels= c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul","Aug","Sep","Oct","Nov","Dec"))
data3[order(data3$trip_month), ]
head(data3)
glimpse(data3)
## Rows: 4,586,829
## Columns: 18
## $ ride_id <chr> "B9F73448DFBE0D45", "457C7F4B5D3DA135", "57C7503...
## $ rideable_type <chr> "classic_bike", "electric_bike", "electric_bike"...
## $ started_at <dttm> 2021-01-24 19:15:38, 2021-01-23 12:57:38, 2021-...
## $ ended_at <dttm> 2021-01-24 19:22:51, 2021-01-23 13:02:10, 2021-...
## $ start_station_name <chr> "California Ave & Cortez St", "California Ave & ...
## $ start_station_id <chr> "17660", "17660", "17660", "17660", "17660", "17...
## $ end_station_name <chr> "Wood St & Augusta Blvd", "California Ave & Nort...
## $ end_station_id <chr> "657", "13258", "657", "657", "657", "KA15040001...
## $ start_lat <dbl> 41.90036, 41.90041, 41.90037, 41.90038, 41.90036...
## $ start_lng <dbl> -87.69670, -87.69673, -87.69669, -87.69672, -87....
## $ end_lat <dbl> 41.89918, 41.91044, 41.89918, 41.89915, 41.89918...
## $ end_lng <dbl> -87.67220, -87.69689, -87.67218, -87.67218, -87....
## $ member_casual <chr> "member", "member", "casual", "casual", "casual"...
## $ trip_distance <dbl> 2037.5917, 1114.0491, 2038.2011, 2040.8390, 2037...
## $ triptime_in_secs <dbl> 433, 272, 587, 537, 609, 1233, 360, 268, 1103, 1...
## $ trip_date <date> 2021-01-24, 2021-01-23, 2021-01-09, 2021-01-09,...
## $ trip_day <fct> Sunday, Saturday, Saturday, Saturday, Sunday, Fr...
## $ trip_month <fct> Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan...
#trip day and month have been ordered and are now factors
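One caveat with the steps above: `weekdays()` and `strftime(..., '%b')` return names in the system locale, so on a non-English system the hard-coded English factor levels would not match and the columns would become `NA`. A hedged alternative, sketched here in base R on three example dates, derives the names from numeric date parts instead; the variable names are illustrative only.

```r
# English names fixed in code, so the labels do not depend on the system locale
day_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")

dates <- as.Date(c("2021-01-24", "2021-01-23", "2021-07-05"))

# "%w" gives the weekday as a number: 0 = Sunday ... 6 = Saturday
day_index <- as.integer(format(dates, "%w")) + 1
trip_day <- factor(day_levels[day_index], levels = day_levels)

# "%m" gives the month number; month.abb holds base R's English abbreviations
trip_month <- factor(month.abb[as.integer(format(dates, "%m"))], levels = month.abb)

sort(trip_day)     # sorted by level order (Sunday first), not alphabetically
table(trip_month)  # months appear in calendar order
```

Because the factor levels carry the order, grouped summaries and plots pick it up without any explicit sorting of the rows.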
# let's rename the column member_casual to something more descriptive
data3<-data3 %>% rename(membership_type=member_casual)
The analysis will cover the following to draw the necessary insights:
* Number of rides taken by each membership type per month
* Number of rides taken by each membership type per day of week
* Average distance travelled by each membership type per month
* Average distance travelled by each membership type per day of week
* Average time spent cycling by members and casual riders per day of week
* Average time spent cycling by members and casual riders per month
* Most-used bike type in terms of number of rides
* Most-used bike type in terms of average distance travelled
* Total number of rides per month
rides_per_day <- data3 %>%
group_by(membership_type, trip_day) %>%
summarise(number_of_rides = n(), .groups = 'drop') %>%
arrange(trip_day) %>%
tidyr::spread(key = membership_type,value = number_of_rides)
print(rides_per_day)
## # A tibble: 7 x 3
## trip_day casual member
## <fct> <int> <int>
## 1 Sunday 403452 311210
## 2 Monday 228781 346474
## 3 Tuesday 214819 388118
## 4 Wednesday 218013 397679
## 5 Thursday 224082 373466
## 6 Friday 289863 365773
## 7 Saturday 468033 357066
# From the table above, casual riders use Cyclistic bikes mostly on weekends (Saturday and Sunday have by far the highest casual ride counts), while annual members ride mostly on weekdays, with rides rising from Monday to a midweek peak on Wednesday.
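The summary tables in this report all follow the same pattern: count or average per group, then spread the membership types into columns. `tidyr::spread()` still works but is superseded by `pivot_wider()`, so the same table shape could be produced as in the sketch below. The toy data frame is made up purely to show the shape of the result.

```r
library(dplyr)
library(tidyr)

# Made-up toy rides, only to illustrate the reshaping step
toy <- data.frame(
  membership_type = c("casual", "member", "member", "casual", "member"),
  trip_day = factor(c("Sunday", "Sunday", "Monday", "Monday", "Monday"),
                    levels = c("Sunday", "Monday"))
)

rides_wide <- toy %>%
  count(membership_type, trip_day, name = "number_of_rides") %>%  # rides per group
  pivot_wider(names_from = membership_type, values_from = number_of_rides) %>%
  arrange(trip_day)

rides_wide
```

`count()` replaces the `group_by()` plus `summarise(n())` pair, and `names_from` / `values_from` play the roles of `key` / `value` in `spread()`.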
rides_per_month <- data3 %>%
group_by(membership_type, trip_month) %>%
summarise(number_of_rides = n(), .groups = 'drop') %>%
arrange(trip_month) %>%
tidyr::spread(key = membership_type,value = number_of_rides)
print(rides_per_month)
## # A tibble: 12 x 3
## trip_month casual member
## <fct> <int> <int>
## 1 Jan 14675 68818
## 2 Feb 8592 34379
## 3 Mar 75551 130045
## 4 Apr 120310 177779
## 5 May 216608 234152
## 6 Jun 303930 304577
## 7 Jul 369207 322892
## 8 Aug 341356 332911
## 9 Sep 292821 328183
## 10 Oct 189029 288851
## 11 Nov 69923 185906
## 12 Dec 45041 131293
#OBSERVATION
#on a monthly basis, members took more rides than casual riders except in July and August, when rides by casual riders exceeded rides by members by about 14.3% and 2.5% respectively (measured relative to the member count)
#total number of rides by each membership type
number_per_membership <- data3 %>%
group_by(membership_type) %>%
summarize(number_of_rides = n() , .groups = 'drop') %>%
tidyr::spread(key = membership_type,value = number_of_rides)
#Overall, members took about 55.4% of all rides and casual riders about 44.6%, a gap of 10.7% of all rides
#Let's compute that gap: (member - casual) / (member + casual) * 100
temptable <- data3 %>%
group_by(membership_type) %>%
summarize(number_of_rides = n() , .groups = 'drop') %>%
tidyr::spread(key = membership_type,value = number_of_rides) %>%
summarise(ratio_to_m=((member-casual)/(member+casual)*100))
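Percentage gaps depend on the base they are measured against, so it helps to spell the arithmetic out. The sketch below recomputes the gaps from the monthly ride counts printed in the `rides_per_month` table; the variable names are illustrative and the three measures are shown side by side only to make the choice of base explicit.

```r
# Monthly ride counts copied from the rides_per_month table (Jan ... Dec)
casual <- c(14675, 8592, 75551, 120310, 216608, 303930,
            369207, 341356, 292821, 189029, 69923, 45041)
member <- c(68818, 34379, 130045, 177779, 234152, 304577,
            322892, 332911, 328183, 288851, 185906, 131293)
names(casual) <- month.abb
names(member) <- month.abb

total_casual <- sum(casual)
total_member <- sum(member)

# Gap as a share of all rides (the formula used for ratio_to_m)
gap_share_of_all <- (total_member - total_casual) / (total_member + total_casual) * 100

# Gap relative to the casual count ("members took x% more rides than casual riders")
gap_relative_to_casual <- (total_member - total_casual) / total_casual * 100

# Monthly excess of casual rides over member rides, relative to the member count
casual_excess_vs_member <- (casual - member) / member * 100

round(gap_share_of_all, 1)
round(gap_relative_to_casual, 1)
round(casual_excess_vs_member[c("Jul", "Aug")], 1)
```

The two yearly totals add up to the 4,586,829 rows of `data3`, which confirms the counts were copied correctly.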
monthly_avg_trip_distance <- data3 %>%
group_by(membership_type, trip_month) %>%
summarise(average_trip_dist = mean(trip_distance), .groups = 'drop') %>%
arrange(trip_month) %>%
tidyr::spread(key = membership_type,value = average_trip_dist)
print(monthly_avg_trip_distance)
## # A tibble: 12 x 3
## trip_month casual member
## <fct> <dbl> <dbl>
## 1 Jan 1921. 1922.
## 2 Feb 2016. 1947.
## 3 Mar 2047. 2103.
## 4 Apr 2048. 2143.
## 5 May 2133. 2184.
## 6 Jun 2187. 2195.
## 7 Jul 2218. 2180.
## 8 Aug 2244. 2140.
## 9 Sep 2266. 2105.
## 10 Oct 2193. 1976.
## 11 Nov 2003. 1863.
## 12 Dec 1930. 1858.
#On average, there is only a marginal difference between the distance covered by casual riders and by members in each month. In January, the average distances for both membership types were almost identical (about 1,921 m and 1,922 m).
#distance traveled per day of the week per membership type
avg_dist_per_weekday <- data3 %>%
group_by(membership_type, trip_day) %>%
summarise(avg_trip_dist = mean(trip_distance), .groups = 'drop') %>%
arrange(trip_day) %>%
tidyr::spread(key = membership_type,value = avg_trip_dist)
print(avg_dist_per_weekday)
## # A tibble: 7 x 3
## trip_day casual member
## <fct> <dbl> <dbl>
## 1 Sunday 2244. 2187.
## 2 Monday 2066. 2039.
## 3 Tuesday 2094. 2054.
## 4 Wednesday 2119. 2066.
## 5 Thursday 2130. 2047.
## 6 Friday 2166. 2052.
## 7 Saturday 2282. 2187.
avg_ridetime_per_weekday <- data3 %>%
group_by(membership_type, trip_day) %>%
summarise(avg_ride_time = mean(triptime_in_secs), .groups = 'drop') %>%
arrange(trip_day) %>%
tidyr::spread(key = membership_type,value = avg_ride_time)
print(avg_ridetime_per_weekday)
## # A tibble: 7 x 3
## trip_day casual member
## <fct> <dbl> <dbl>
## 1 Sunday 1942. 911.
## 2 Monday 1724. 763.
## 3 Tuesday 1552. 743.
## 4 Wednesday 1463. 747.
## 5 Thursday 1446. 741.
## 6 Friday 1571. 767.
## 7 Saturday 1828. 888.
#On average, casual riders' rides last roughly twice as long as members' rides on every day of the week
#average ride time(in secs) per month
avg_ridetime_per_month <- data3 %>%
group_by(membership_type, trip_month) %>%
summarise(avg_ride_time = mean(triptime_in_secs), .groups = 'drop') %>%
arrange(trip_month) %>%
tidyr::spread(key = membership_type,value = avg_ride_time)
print(avg_ridetime_per_month)
## # A tibble: 12 x 3
## trip_month casual member
## <fct> <dbl> <dbl>
## 1 Jan 1337. 722.
## 2 Feb 1863. 882.
## 3 Mar 1933. 819.
## 4 Apr 1926. 855.
## 5 May 1987. 860.
## 6 Jun 1845. 848.
## 7 Jul 1707. 827.
## 8 Aug 1625. 812.
## 9 Sep 1572. 788.
## 10 Oct 1461. 721.
## 11 Nov 1211. 656.
## 12 Dec 1208. 635.
#creating this temp table to see each membership type's share of the combined average ride time per month (these are shares of the sum of the two averages, not of total riding time)
temp1 <- avg_ridetime_per_month %>%
group_by(trip_month) %>%
summarise(total=sum(casual,member),ratio_to_total_c=(casual/total)*100,ratio_to_total_m=(member/total)*100)
print(temp1)
## # A tibble: 12 x 4
## trip_month total ratio_to_total_c ratio_to_total_m
## <fct> <dbl> <dbl> <dbl>
## 1 Jan 2059. 64.9 35.1
## 2 Feb 2744. 67.9 32.1
## 3 Mar 2752. 70.2 29.8
## 4 Apr 2780. 69.3 30.7
## 5 May 2847. 69.8 30.2
## 6 Jun 2692. 68.5 31.5
## 7 Jul 2535. 67.4 32.6
## 8 Aug 2438. 66.7 33.3
## 9 Sep 2359. 66.6 33.4
## 10 Oct 2181. 67.0 33.0
## 11 Nov 1866. 64.9 35.1
## 12 Dec 1843. 65.5 34.5
#on average, casual riders' ride time is roughly double that of members: between about 85% and 136% longer, depending on the month
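The size of the ride-time gap can be read straight off the monthly averages. The sketch below recomputes it from the rounded values printed in the `avg_ridetime_per_month` table, so the results are approximate; the variable names are illustrative.

```r
# Average ride time in seconds per month, copied (rounded) from the table above
casual_avg <- c(1337, 1863, 1933, 1926, 1987, 1845, 1707, 1625, 1572, 1461, 1211, 1208)
member_avg <- c(722, 882, 819, 855, 860, 848, 827, 812, 788, 721, 656, 635)

# How much longer casual rides are than member rides, in percent
pct_longer <- (casual_avg / member_avg - 1) * 100
names(pct_longer) <- month.abb

round(pct_longer)
round(range(pct_longer))

# Months in which casual rides were less than 100% longer than member rides
names(pct_longer)[pct_longer < 100]
```

A value of 100 means casual rides lasted exactly twice as long as member rides in that month.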
#for seeing the distance travelled for each bike type
dist_travelled_per_bike<-data3 %>%
group_by(rideable_type,membership_type) %>%
summarise(distance_of_ride = mean(trip_distance), .groups = 'drop') %>%
arrange(rideable_type)
#Electric bikes covered the longest average distances for both membership types, while docked bikes were the least used.
#It appears that annual members were not keen on docked bikes.
#number of times each bike type was used-frequency of usage
frequency_per_bike <- data3 %>%
group_by(rideable_type,membership_type) %>%
summarise(number_of_rides = n(), .groups = 'drop') %>%
arrange(rideable_type) %>%
tidyr::spread(key = membership_type,value = number_of_rides)
#classic bikes were the most used by both membership types, although members used them more than casual riders
#Although classic bikes were used more frequently, electric bikes covered longer distances than classic bikes
#members mostly used electric and classic bikes and rarely used docked bikes. Could this account for the shorter ride times for members? Recall that casual riders' ride times were roughly double those of annual members.
#which day had the highest number of rides?
day_with_most_rides <- data3 %>%
group_by(trip_day) %>%
summarise(number_of_rides = n(), .groups = 'drop') %>%
arrange(desc(number_of_rides))
#Saturday had the highest number of rides (825,099), followed by Sunday (714,662), owing to the surge in the number of rides taken by casual riders during weekends.
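The daily totals can be cross-checked by adding the two columns of the `rides_per_day` table printed earlier. A minimal base R sketch, with the counts copied from that table and illustrative variable names:

```r
# Rides per day of week, copied from the rides_per_day table
casual <- c(Sunday = 403452, Monday = 228781, Tuesday = 214819, Wednesday = 218013,
            Thursday = 224082, Friday = 289863, Saturday = 468033)
member <- c(Sunday = 311210, Monday = 346474, Tuesday = 388118, Wednesday = 397679,
            Thursday = 373466, Friday = 365773, Saturday = 357066)

total <- casual + member

# Busiest days first
sort(total, decreasing = TRUE)

# Day with the most rides overall
names(which.max(total))
```

Sorting in decreasing order puts the busiest day at the top, which is easier to read than the default ascending order.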
My recommendations are presented below:
Weekend rides are more popular among casual riders. Cyclistic could offer these riders weekday discounts to encourage them to ride more during the week and eventually convert them into full members.
Both membership categories used electric bikes to travel longer distances. Cyclistic could use this finding to justify supplying more electric bikes, and to entice non-members to join as full members.
Members rarely used docked bikes. A survey of annual members could determine why they do not use this rideable type and whether it should be phased out entirely. The survey’s findings would also help the organization create personalized marketing strategies.
Since members ride more often overall, as seen from the analysis, Cyclistic could offer them a referral discount when they renew their subscriptions. This could improve member retention and bring more conversions from casual riders to members.
Thank you.